Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition

نویسندگان

  • Joe Tebelskis
  • Alexander H. Waibel
چکیده

Connectionist Rpeech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phonf'mp and word classifier which uses DTW to modulate its connectivit.y pattern, and which is directly trained on word-level targets. The consistent use of word accuracy as a criterion during bot.h t.raining and testing leads to very high system performance, even wif II limited training dat.a. Until now, the MS-TDN N has been appli('d primarily to small vocabulary recognition and word spotting tasks. In this papf'f we apply the architecture to large vocabulary continuous speech recognition, and demonstrate that our MS-TDNN outperforms all ot,hf'r systems that have been tested on tht' eMU Conference Registration database.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

On designing pronunciation lexicons for large vocabulary, continuous speech recognition

Creation of pronunciation lexicons for speech recognition is widely acknowledged to be an important, but labor-intensive, aspect of system development. Lexicons are often manually created and make use of knowledge and expertise that is difficult to codify. In this paper we describe our American English lexicon developed primarily for the ARPA WSJ/NAB tasks. The lexicon is phonemically represent...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

Two Pass Hidden Markov Model for Speech Recognition

1 Abstract This paper is an approach to increase the effectiveness of Hidden Markov Models (HMM) in the speech recognition field. The goal is to build a large vocabulary isolated words speech recogniser. The model, that we are dealing with, is of continuous HMM type (CHMM). The topology selected is the left-right one as it is quite successful in speech recognition due to its consistency with th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992